Abstract. Using stochastic flows, a minimum principle is obtained when a diffusion is
controlled using stochastic open loop controls. An equation for the adjoint process is then
derived using an explicit formula for the integrand in a certain stochastic integral.



1.  Introduction. 

There have been many proofs of minimum principles in stochastic control. For a small
sample see the works of Kushner [15], Bismut [2], Haussmann [10], [11], [12], Davis and
Varaiya [6], and the book by Elliott [8]. In this paper we consider a diffusion and stochastic
open loop controls, that is, controls which are adapted to the filtration of the driving
Brownian motion process. For such controls the dynamical equations have strong solutions,
and the results on the differentiability of the solution, due originally to Blagovescenskii and
Freidlin [1], can be applied. The work of Kunita [14] and Bismut [2] on stochastic flows
enables the variation in the expected cost, due to a perturbation of the optimal control, to
be calculated explicitly. The minimum principle follows by differentiating this quantity.

If the optimal control is Markov, the stochastic integral representation result of [9] is
applied to give an expression for a quantity associated with the adjoint process. Stochastic
calculus is then used to derive the equation satisfied by the adjoint process.
2.  Dynamics.


Suppose  the  state  of  a  system  is  described  by  a  stochastic  differential  equation: 

$$d\xi_t = f(t, \xi_t, u)\,dt + g(t, \xi_t)\,dw_t, \qquad \xi_t \in R^d, \quad \xi_0 = x_0, \quad 0 \le t \le T. \tag{2.1}$$

The control parameter $u$ will take values in a compact subset $U$ of some Euclidean space $R^k$.
We shall make the following assumptions.

A1: $f : [0,T] \times R^d \times U \to R^d$ is Borel measurable, continuous in $u$ for each $(t,x)$,
continuously differentiable in $x$ and for some constant $K_1$

$$(1 + |x|)^{-1} |f(t,x,u)| + |f_x(t,x,u)| \le K_1.$$

A2: $g : [0,T] \times R^d \to R^d \otimes R^n$ is a matrix valued, Borel measurable function, continuously
differentiable in $x$, and for some constant $K_2$

$$|g(t,x)| + |g_x(t,x)| \le K_2.$$

The columns of $g$ will be denoted by $g^{(k)}$ for $k = 1, \dots, n$.

A3: $w = (w^1, \dots, w^n)$ is an n-dimensional Brownian motion on a probability space
$(\Omega, F, P)$ with a right continuous, complete filtration $\{F_t\}$, $0 \le t \le T$.

DEFINITION 2.1. The set of admissible controls $\underline{U}$ will be the $F_t$-predictable functions on
$[0,T] \times \Omega$ with values in $U$. These are sometimes called 'stochastic open loop' controls [3].


REMARKS 2.2. For each $u \in \underline{U}$ there is, therefore, a strong solution of (2.1), and
we shall write $\xi^u_{s,t}(x)$ for the solution trajectory given by

$$\xi^u_{s,t}(x) = x + \int_s^t f(r, \xi^u_{s,r}(x), u_r)\,dr + \int_s^t g(r, \xi^u_{s,r}(x))\,dw_r. \tag{2.2}$$
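The strong solution (2.2) can be approximated path by path. The following is a minimal Euler-Maruyama sketch in Python for the scalar case d = n = 1. The function name `euler_maruyama`, the step count and the example coefficients are illustrative assumptions and do not come from the paper. The control is handed only the Brownian path up to the current time, which mimics a control adapted to the driving filtration.

```python
import numpy as np

def euler_maruyama(f, g, control, x0, T, n_steps, rng):
    """Approximate d(xi) = f(t, xi, u) dt + g(t, xi) dw on [0, T] (scalar case)."""
    dt = T / n_steps
    xi = np.empty(n_steps + 1)
    w = np.zeros(n_steps + 1)  # discretised Brownian path
    xi[0] = x0
    for i in range(n_steps):
        t = i * dt
        # The control sees only the Brownian path up to time t (open loop, adapted).
        u = control(t, w[: i + 1])
        dw = rng.normal(0.0, np.sqrt(dt))
        xi[i + 1] = xi[i] + f(t, xi[i], u) * dt + g(t, xi[i]) * dw
        w[i + 1] = w[i] + dw
    return xi

# Hypothetical example: U = [-1, 1], f(t, x, u) = -x + u, g = 0.5 (constant).
rng = np.random.default_rng(0)
path = euler_maruyama(
    f=lambda t, x, u: -x + u,
    g=lambda t, x: 0.5,
    control=lambda t, w: float(np.clip(-w[-1], -1.0, 1.0)),
    x0=1.0, T=1.0, n_steps=500, rng=rng,
)
```

The example coefficients were chosen so that the growth and derivative bounds in the assumptions on f and g hold; any other choice satisfying them would serve equally well.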

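Then, because $u$ is a (predictable) parameter, the result of Blagovescenskii and Freidlin
[1] extends to this situation, so the Jacobian $\frac{\partial \xi^u_{s,t}}{\partial x}(x) = D^u_{s,t}$ exists and is the solution of

$$D^u_{s,t} = I + \int_s^t f_x(r, \xi^u_{s,r}(x), u_r)\, D^u_{s,r}\,dr + \sum_{k=1}^n \int_s^t g^{(k)}_x(r, \xi^u_{s,r}(x))\, D^u_{s,r}\,dw^k_r. \tag{2.3}$$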


Here $I$ is the $d \times d$ identity matrix. In fact, if the coefficients $f$ and $g$ are $C^k$ the map
$x \to \xi^u_{s,t}(x)$ is $C^{k-1}$.
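A discretised version of the Jacobian equation (2.3) can be compared with a finite-difference derivative of the discretised flow. The sketch below does this in the scalar case with the control held fixed and absorbed into the drift; the coefficient choices and the name `flow_and_jacobian` are assumptions made only for illustration.

```python
import numpy as np

def flow_and_jacobian(x, dw, dt, f, fx, g, gx):
    """Euler scheme for the flow xi and for the Jacobian D of (2.3), scalar case."""
    xi, D = x, 1.0
    for k, dwk in enumerate(dw):
        t = k * dt
        # D uses the coefficients' x-derivatives evaluated at the current state.
        D = D + fx(t, xi) * D * dt + gx(t, xi) * D * dwk
        xi = xi + f(t, xi) * dt + g(t, xi) * dwk
    return xi, D

rng = np.random.default_rng(0)
n, T = 200, 1.0
dt = T / n
dw = rng.normal(0.0, np.sqrt(dt), size=n)  # one fixed Brownian path

# Hypothetical smooth coefficients with bounded derivatives.
f = lambda t, x: np.sin(x)
fx = lambda t, x: np.cos(x)
g = lambda t, x: 0.5 * np.cos(x)
gx = lambda t, x: -0.5 * np.sin(x)

x, eps = 0.3, 1e-5
_, D = flow_and_jacobian(x, dw, dt, f, fx, g, gx)
xi_plus, _ = flow_and_jacobian(x + eps, dw, dt, f, fx, g, gx)
xi_minus, _ = flow_and_jacobian(x - eps, dw, dt, f, fx, g, gx)
fd = (xi_plus - xi_minus) / (2 * eps)  # central difference along the same path
```

Each Euler step is differentiable in the starting point, so the discrete D is the exact derivative of the discrete flow by the chain rule, and `fd` and `D` should agree up to finite-difference error.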

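Consider the matrix valued process $H$ defined by:

$$H^u_{s,t} = I - \int_s^t H^u_{s,r}\Bigl(f_x(r, \xi^u_{s,r}(x), u_r) - \sum_{k=1}^n \bigl(g^{(k)}_x(r, \xi^u_{s,r}(x))\bigr)^2\Bigr)\,dr - \sum_{k=1}^n \int_s^t H^u_{s,r}\, g^{(k)}_x(r, \xi^u_{s,r}(x))\,dw^k_r. \tag{2.4}$$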


Then using the Ito rule we see $d(H^u_{s,t} D^u_{s,t}) = 0$ and $H^u_{s,s} D^u_{s,s} = I$, so $H^u_{s,t} = (D^u_{s,t})^{-1}$.
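The Ito computation behind this identity is short in the scalar case d = n = 1. As a sketch, write $a_r$ for $f_x(r, \xi^u_{s,r}(x), u_r)$ and $b_r$ for $g_x(r, \xi^u_{s,r}(x))$; this shorthand is introduced here only for illustration. With $D = D^u_{s,r}$ and $H = H^u_{s,r}$:

```latex
\begin{aligned}
dD &= a_r D\,dr + b_r D\,dw_r, \\
dH &= -H\,(a_r - b_r^2)\,dr - H\,b_r\,dw_r, \\
d\langle H, D\rangle_r &= -H\,b_r \cdot b_r D\,dr = -b_r^2\,HD\,dr, \\
d(HD) &= H\,dD + D\,dH + d\langle H, D\rangle_r \\
      &= HD\,\bigl(a_r\,dr + b_r\,dw_r\bigr)
       - HD\,\bigl((a_r - b_r^2)\,dr + b_r\,dw_r\bigr)
       - b_r^2\,HD\,dr \\
      &= 0 .
\end{aligned}
```

The correction term $+b_r^2$ in the drift of $H$ is exactly what cancels the cross-variation term, which is why the squared $g_x$ terms appear in the definition of $H$.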

Write $\|\xi^u(x_0)\|_t = \sup_{0 \le s \le t} |\xi^u_{0,s}(x_0)|$. Then, as in Lemma 2.1 of [12], for any $p$,
$1 \le p < \infty$, using Gronwall's and Jensen's inequalities

$$\|\xi^u(x_0)\|_T^p \le C\Bigl(1 + |x_0|^p + \sup_{0 \le t \le T}\Bigl|\int_0^t g(r, \xi^u_{0,r}(x_0))\,dw_r\Bigr|^p\Bigr)$$

almost surely for some constant $C$. Therefore, using Burkholder's inequality and hypothesis
A2, $\|\xi^u(x_0)\|_T$ is in $L^p$ for all $p$, $1 \le p < \infty$. Write

$$\|D^u\|_T = \sup_{0 \le s \le T} |D^u_{0,s}|, \qquad \|H^u\|_T = \sup_{0 \le s \le T} |H^u_{0,s}|.$$

Then, because $f_x$ and $g_x$ are bounded, an application of Gronwall's, Jensen's and
Burkholder's inequalities again implies

$\|D^u\|_T$ and $\|H^u\|_T$ are in $L^p$ for all $p$, $1 \le p < \infty$.

COST 2.3. Suppose for simplicity that the cost associated with the process is purely
terminal and given by a bounded $C^2$ function

$$c(\xi^u_{0,T}(x_0)).$$

We suppose $|c(x)| + |c_\xi(x)| + |c_{\xi\xi}(x)| \le K_3(1 + |x|^q)$ for some $q < \infty$.

The expected cost if a control $u \in \underline{U}$ is used is, therefore,

$$J(u) = E[c(\xi^u_{0,T}(x_0))].$$

We shall suppose there is an optimal control $u^* \in \underline{U}$ so

$$J(u^*) \le J(u) \quad \text{for all } u \in \underline{U}.$$


NOTATION 2.4. If $u^*$ is an optimal control write $\xi^*$ for $\xi^{u^*}$, $D^*$ for $D^{u^*}$ etc.

REMARKS 2.5. Consider a d-dimensional semimartingale of the form

$$z_t = z_s + A_t$$

where $A$ is a predictable bounded variation process. Then Kunita's formula [14] for the
composition of processes can be applied (see also Bismut [5]), and we have

$$\xi^*_{s,t}(z_t) = z_s + \int_s^t f(r, \xi^*_{s,r}(z_r), u^*_r)\,dr + \int_s^t \frac{\partial \xi^*_{s,r}}{\partial x}(z_r)\,dA_r + \sum_{k=1}^n \int_s^t g^{(k)}(r, \xi^*_{s,r}(z_r))\,dw^k_r. \tag{2.5}$$


DEFINITION 2.6. Consider perturbations of the optimal control $u^*$ of the following kind:
For $s \in [0,T]$, $h > 0$ such that $0 \le s < s + h \le T$, and $A \in F_s$ define, for any other
admissible control $\tilde{u} \in \underline{U}$,

$$u(t,\omega) = \begin{cases} u^*(t,\omega) & \text{if } (t,\omega) \notin [s, s+h] \times A, \\ \tilde{u}(t,\omega) & \text{if } (t,\omega) \in [s, s+h] \times A. \end{cases}$$


Applying  (2.5)  we  have,  similarly  to  Theorem  5.1  of  [4],  the  following  result. 


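THEOREM 2.7. For the perturbation $u$ of $u^*$ consider the process

$$z_t = x + \int_s^t \Bigl(\frac{\partial \xi^*_{s,r}}{\partial x}(z_r)\Bigr)^{-1} \bigl(f(r, \xi^*_{s,r}(z_r), u_r) - f(r, \xi^*_{s,r}(z_r), u^*_r)\bigr)\,dr. \tag{2.6}$$

Then the process $\xi^*_{s,t}(z_t)$ is indistinguishable from $\xi^u_{s,t}(x)$.

PROOF. Substituting (2.6) in (2.5) we see

$$\begin{aligned}
\xi^*_{s,t}(z_t) &= x + \int_s^t f(r, \xi^*_{s,r}(z_r), u^*_r)\,dr + \int_s^t \bigl(f(r, \xi^*_{s,r}(z_r), u_r) - f(r, \xi^*_{s,r}(z_r), u^*_r)\bigr)\,dr \\
&\quad + \int_s^t g(r, \xi^*_{s,r}(z_r))\,dw_r \\
&= x + \int_s^t f(r, \xi^*_{s,r}(z_r), u_r)\,dr + \int_s^t g(r, \xi^*_{s,r}(z_r))\,dw_r.
\end{aligned}$$

However, the solution to (2.2) is unique so $\xi^*_{s,t}(z_t) = \xi^u_{s,t}(x)$.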

REMARKS 2.8. Note that $u(t) = u^*(t)$ if $t > s + h$ so $z_t = z_{s+h}$ if $t > s + h$.
Therefore

$$\xi^u_{s,t}(x) = \xi^*_{s,t}(z_{s+h}) = \xi^*_{s+h,t}\bigl(\xi^*_{s,s+h}(z_{s+h})\bigr)$$

if $t > s + h$.


3.  A  Minimum  Principle. 


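$$J(u^*) = E[c(\xi^*_{0,T}(x_0))] = E[c(\xi^*_{s,T}(x))] \quad \text{where } x = \xi^*_{0,s}(x_0),$$

because, by uniqueness, $\xi^*_{0,T}(x_0) = \xi^*_{s,T}(x)$. Similarly,

$$J(u) = E[c(\xi^u_{0,T}(x_0))] = E[c(\xi^u_{s,T}(x))] = E[c(\xi^*_{s,T}(z_{s+h}))].$$

Therefore,

$$J(u) - J(u^*) = E[c(\xi^*_{s,T}(z_{s+h})) - c(\xi^*_{s,T}(x))].$$

Because $\xi^*_{s,T}(\cdot)$ is differentiable this is

$$= E\Bigl[\int_s^{s+h} c_\xi(\xi^*_{s,T}(z_r))\, \frac{\partial \xi^*_{s,T}}{\partial x}(z_r) \Bigl(\frac{\partial \xi^*_{s,r}}{\partial x}(z_r)\Bigr)^{-1} \bigl(f(r, \xi^*_{s,r}(z_r), u_r) - f(r, \xi^*_{s,r}(z_r), u^*_r)\bigr)\,dr\Bigr]. \tag{3.1}$$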

This  gives  an  explicit  formula  for  the  change  in  the  cost  resulting  from  a  ‘strong’  variation 
in  the  optimal  control.  It  involves  only  a  time  integration.  The  only  remaining  problem  is 
to  justify  the  differentiation  of  the  right  hand  side  of  (3.1). 

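Write $\Gamma(s,r,z) = c_\xi(\xi^*_{s,T}(z))\, \frac{\partial \xi^*_{s,T}}{\partial x}(z) \Bigl(\frac{\partial \xi^*_{s,r}}{\partial x}(z)\Bigr)^{-1}$.

Then

$$\begin{aligned}
J(u) - J(u^*) &= \int_s^{s+h} E\bigl[(\Gamma(s,r,z_r) - \Gamma(s,r,x))\bigl(f(r, \xi^*_{s,r}(z_r), u_r) - f(r, \xi^*_{s,r}(z_r), u^*_r)\bigr)\bigr]\,dr \\
&\quad + \int_s^{s+h} E\bigl[(\Gamma(s,r,x) - \Gamma(r,r,x))\bigl(f(r, \xi^*_{s,r}(z_r), u_r) - f(r, \xi^*_{s,r}(z_r), u^*_r)\bigr)\bigr]\,dr \\
&\quad + \int_s^{s+h} E\bigl[\Gamma(r,r,x)\bigl(f(r, \xi^*_{s,r}(z_r), u_r) - f(r, \xi^*_{s,r}(z_r), u^*_r) \\
&\qquad\qquad - f(r, \xi^*_{s,r}(x), u_r) + f(r, \xi^*_{s,r}(x), u^*_r)\bigr)\bigr]\,dr \\
&\quad + \int_s^{s+h} E\bigl[\Gamma(r,r,x)\bigl(f(r, \xi^*_{0,r}(x_0), u_r) - f(r, \xi^*_{0,r}(x_0), u^*_r)\bigr)\bigr]\,dr \\
&= I_1(h) + I_2(h) + I_3(h) + I_4(h), \text{ say.}
\end{aligned}$$

$$\begin{aligned}
|I_1(h)| &\le K_4 \int_s^{s+h} E\bigl[|\Gamma(s,r,z_r) - \Gamma(s,r,x)|\,(1 + \|\xi^u(x_0)\|_{s+h})\bigr]\,dr \\
&\le K_4 h \sup_{s \le r \le s+h} E\bigl[|\Gamma(s,r,z_r) - \Gamma(s,r,x)|\,(1 + \|\xi^u(x_0)\|_{s+h})\bigr], \\
|I_2(h)| &\le K_5 \int_s^{s+h} E\bigl[|\Gamma(s,r,x) - \Gamma(r,r,x)|\,(1 + \|\xi^u(x_0)\|_{s+h})\bigr]\,dr \\
&\le K_5 h \sup_{s \le r \le s+h} E\bigl[|\Gamma(s,r,x) - \Gamma(r,r,x)|\,(1 + \|\xi^u(x_0)\|_{s+h})\bigr], \\
|I_3(h)| &\le K_6 \int_s^{s+h} E\bigl[|\Gamma(r,r,x)|\, \|x - z\|_{s+h}\bigr]\,dr \\
&\le K_6 h \sup_{s \le r \le s+h} E\bigl[|\Gamma(r,r,x)|\, \|x - z\|_{s+h}\bigr].
\end{aligned}$$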


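The differences $|\Gamma(s,r,z_r) - \Gamma(s,r,x)|$, $|\Gamma(s,r,x) - \Gamma(r,r,x)|$ and $\|x - z\|_{s+h}$ are all uniformly
bounded in some $L^p$, $p \ge 1$, and

$$\lim_{h \to 0} |\Gamma(s,r,z_r) - \Gamma(s,r,x)| = 0 \quad \text{a.s.}$$
$$\lim_{h \to 0} |\Gamma(s,r,x) - \Gamma(r,r,x)| = 0 \quad \text{a.s.}$$
$$\lim_{h \to 0} \|x - z\|_{s+h} = 0.$$

Therefore,

$$\lim_{h \to 0} \|\Gamma(s,r,z_r) - \Gamma(s,r,x)\|_p = 0$$
$$\lim_{h \to 0} \|\Gamma(s,r,x) - \Gamma(r,r,x)\|_p = 0$$

and $\lim_{h \to 0} \|(\|x - z\|_{s+h})\|_p = 0$ for some $p$.

Consequently, $\lim_{h \to 0} h^{-1} I_k(h) = 0$, for $k = 1, 2, 3$.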

The only remaining problem concerns the differentiability of

$$I_4(h) = \int_s^{s+h} E\bigl[\Gamma(r,r,x)\bigl(f(r, \xi^*_{0,r}(x_0), u_r) - f(r, \xi^*_{0,r}(x_0), u^*_r)\bigr)\bigr]\,dr.$$

The integrand is almost surely in $L^1([0,T])$ so $\lim_{h \to 0} h^{-1} I_4(h)$ exists for almost every
$s \in [0,T]$. However, the set of times $\{s\}$ where the limit may not exist might depend on the
control $u$. Consequently we must restrict the perturbations $u$ of the optimal control $u^*$ to
perturbations from a countable dense set of controls. In fact:


1) Because the trajectories are, almost surely, continuous, $F_p$ is countably generated
by sets $\{A_{ip}\}$, $i = 1, 2, \dots$ for any rational number $p \in [0,T]$. Consequently $F_t$ is
countably generated by the sets $\{A_{ip}\}$, $p \le t$.

2) Let $G_t$ denote the set of measurable functions from $(\Omega, F_t)$ to $U \subset R^k$. (If $u \in \underline{U}$
then $u(t,\omega) \in G_t$.) Using the $L^1$-norm, as in [7], there is a countable dense subset
$H_p = \{u_{jp}\}$ of $G_p$, for rational $p \in [0,T]$. If $H_t = \bigcup_{p \le t} H_p$ then $H_t$ is a countable
dense subset of $G_t$. If $u_{jp} \in H_p$ then, as a function constant in time, $u_{jp}$ can be
considered as an admissible control over any time interval $[t,T]$ for $t \ge p$.

3) The countable family of perturbations is obtained by considering sets $A_{ip} \in F_t$,
functions $u_{jp} \in H_t$, where $p \le t$, and defining as in Definition 2.6 the control

$$(s,\omega) \mapsto \begin{cases} u^*(s,\omega) & \text{if } (s,\omega) \notin [t,T] \times A_{ip}, \\ u_{jp}(\omega) & \text{if } (s,\omega) \in [t,T] \times A_{ip}. \end{cases}$$


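Then for each $i, j, p$

$$\lim_{h \to 0} h^{-1} \int_s^{s+h} E\bigl[\Gamma(r,r,x)\bigl(f(r, \xi^*_{0,r}(x_0), u_{jp}) - f(r, \xi^*_{0,r}(x_0), u^*_r)\bigr) I_{A_{ip}}\bigr]\,dr \tag{3.2}$$

exists and equals

$$E\bigl[\Gamma(s,s,x)\bigl(f(s, \xi^*_{0,s}(x_0), u_{jp}) - f(s, \xi^*_{0,s}(x_0), u^*_s)\bigr) I_{A_{ip}}\bigr]$$

for almost all $s \in [0,T]$.

Therefore, considering this perturbation we have

$$\lim_{h \to 0} h^{-1}\bigl(J(u) - J(u^*)\bigr) = E\bigl[\Gamma(s,s,x)\bigl(f(s, \xi^*_{0,s}(x_0), u_{jp}) - f(s, \xi^*_{0,s}(x_0), u^*_s)\bigr) I_{A_{ip}}\bigr] \ge 0$$

for almost all $s \in [0,T]$.

Consequently there is a set $S \subset [0,T]$ of zero Lebesgue measure such that, if $s \notin S$, the
limit in (3.2) exists for all $i, j, p$, and gives

$$E\bigl[\Gamma(s,s,x)\bigl(f(s, \xi^*_{0,s}(x_0), u_{jp}) - f(s, \xi^*_{0,s}(x_0), u^*_s)\bigr) I_{A_{ip}}\bigr] \ge 0.$$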


Using the monotone class theorem, and approximating an arbitrary admissible control
$u \in \underline{U}$ we can deduce that if $s \notin S$

$$E\bigl[\Gamma(s,s,x)\bigl(f(s, \xi^*_{0,s}(x_0), u_s) - f(s, \xi^*_{0,s}(x_0), u^*_s)\bigr) I_A\bigr] \ge 0 \quad \text{for any } u \in \underline{U} \text{ and } A \in F_s. \tag{3.3}$$

Write

$$p_s(x) = E\Bigl[c_\xi(\xi^*_{0,T}(x_0))\, \frac{\partial \xi^*_{s,T}}{\partial x}(x) \Bigm| F_s\Bigr] = E[\Gamma(s,s,x) \mid F_s] \tag{3.4}$$

where, as before, $x = \xi^*_{0,s}(x_0)$. Then $p_s(x)$ is the adjoint variable and we have in (3.3)
proved the following minimum principle:

THEOREM 3.1. If $u^* \in \underline{U}$ is an optimal control there is a set $S \subset [0,T]$ of zero Lebesgue
measure such that if $s \notin S$

$$p_s(x) f(s, x, u^*_s) \le p_s(x) f(s, x, u) \quad \text{a.s.}$$

That is, the optimal control $u^*$ almost surely minimizes the Hamiltonian and the adjoint
variable is $p_s(x)$.
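To see what the minimum principle says in a concrete case, consider the following scalar illustration. The coefficients are hypothetical choices satisfying the standing assumptions, not an example from the paper: take $d = n = 1$, $U = [-1, 1]$, $f(t,x,u) = u$ and $g \equiv \sigma$ constant. Reading the inequality as holding for each fixed $u \in U$:

```latex
\begin{aligned}
& f_x = 0, \quad g_x = 0
  \;\Longrightarrow\; \frac{\partial \xi^*_{s,T}}{\partial x} = 1
  && \text{(Jacobian equation with zero coefficients)} \\
& p_s(x) = E\bigl[c_\xi(\xi^*_{0,T}(x_0)) \mid F_s\bigr]
  && \text{(definition of the adjoint variable)} \\
& p_s(x)\,u^*_s \le p_s(x)\,u \quad \text{for all } u \in [-1, 1]
  && \text{(minimum principle)} \\
& \Longrightarrow\; u^*_s = -\operatorname{sgn}\bigl(p_s(x)\bigr)
  \quad \text{whenever } p_s(x) \ne 0 .
\end{aligned}
```

In this illustration the Hamiltonian is linear in the control, so the minimizer sits on the boundary of $U$ and switches sign with the conditional expectation of the marginal terminal cost.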

REMARKS 3.2. Under certain conditions the minimum cost attainable under the
stochastic open loop controls is equal to the minimum cost attainable under the Markov,
feedback controls of the form $u(s, \xi_{0,s}(x_0))$. See for example [2], [10]. If $u_M$ is a Markov
control, with a corresponding, possibly weak, solution trajectory $\xi^{u_M}$, then $u_M$ can be
considered as a stochastic open loop control $u_M(\omega)$ by putting

$$u_M(\omega) = u_M(s, \xi^{u_M}_{0,s}(x_0, \omega)).$$

This means the control in effect 'follows' its original trajectory $\xi^{u_M}$ rather than any new trajectory.
That is, the control is similar to the adjoint strategies considered by Krylov [13]. The
significance of this is that when we consider variations in the state trajectory $\xi$, and
derivatives of the map $x \to \xi_{s,t}(x)$, the control does not react, and so we do not introduce
derivatives in the $u$ variable.


If the optimal control $u^*$ is Markov the process $\xi^*$ is Markov and

$$p_s(x) = E[\Gamma(s,s,x) \mid F_s] = E[\Gamma(s,s,x) \mid x].$$

LEMMA 3.3. Suppose the optimal control $u^*$ is Markov and write

$$V(s,x) = E[c(\xi^*_{0,T}(x_0)) \mid F_s] = E_{s,x}[c(\xi^*_{0,T}(x_0))].$$

Then $p_s(x)$ is the gradient $V_x(s,x)$.

PROOF. $V(s,x) = E[c(\xi^*_{s,T}(x)) \mid F_s]$ and because the Jacobian $\frac{\partial \xi^*_{s,T}}{\partial x}$ exists the
result follows by differentiating in $x$.


4.  The  Adjoint  Process. 


Suppose the optimal control $u^*$ is Markov. As noted above, $u^*$ can and will be
considered as an open loop control. The Jacobian $\frac{\partial \xi^*_{s,t}}{\partial x}$ exists, as do
higher derivatives.


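THEOREM 4.1. Suppose the optimal control $u^*$ is Markov and the second derivative
$V_{xx}(s,x)$ exists. Then

$$\begin{aligned}
p_s(x) &= E[c_\xi(\xi^*_{0,T}(x_0)) D_{0,T}] - \int_0^s p_r(\xi^*_{0,r}(x_0))\, f_x(r, \xi^*_{0,r}(x_0), u^*_r)\,dr \\
&\quad + \int_0^s V_{xx}(r, \xi^*_{0,r}(x_0))\, g(r, \xi^*_{0,r}(x_0))\,dw_r \\
&\quad - \sum_{k=1}^n \int_0^s V_{xx}(r, \xi^*_{0,r}(x_0))\, g^{(k)}(r, \xi^*_{0,r}(x_0))\, g^{(k)}_x(r, \xi^*_{0,r}(x_0))\,dr.
\end{aligned}$$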


PROOF. Write $f_x(r)$ for $f_x(r, \xi^*_{0,r}(x_0), u^*_r)$ and $g(r)$ for $g(r, \xi^*_{0,r}(x_0))$, etc. By uniqueness
of the solutions to (2.1)

$$\xi^*_{0,T}(x_0) = \xi^*_{s,T}\bigl(\xi^*_{0,s}(x_0)\bigr) \tag{4.1}$$

so, differentiating,

$$D_{0,T} = D_{s,T} D_{0,s} \tag{4.2}$$

where $D_{0,T} = D^*_{0,T}$ etc. (without the *).

From (3.4) and (3.5)

$$p_s(x) = E[c_\xi(\xi^*_{0,T}(x_0)) D_{s,T} \mid F_s]$$

so from (4.2)

$$p_s(x) D_{0,s} = E[c_\xi(\xi^*_{0,T}(x_0)) D_{0,T} \mid F_s] \tag{4.3}$$


and this is a $(P, \{F_t\})$ martingale. Write $x = \xi^*_{0,s}(x_0)$, $D = D_{0,s}$. From the martingale
representation result [9], the integrand in the representation of $p_s(x)D$ as a stochastic
integral is obtained by the Ito rule, noting that only the stochastic integral terms will
appear. These involve the derivatives in $x$ and $D$. Therefore

$$p_s(x) D = E[c_\xi(\xi^*_{0,T}(x_0)) D_{0,T}] + \int_0^s V_{xx}(r, \xi^*_{0,r}(x_0))\, g(r)\,dw_r\, D_{0,r} + \sum_{k=1}^n \int_0^s p_r(\xi^*_{0,r}(x_0))\, g^{(k)}_x(r)\, D_{0,r}\,dw^k_r. \tag{4.4}$$

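Recall from (2.4) that $H_{0,s} = D^{-1}$ so forming the product of (2.4) and (4.4), using the Ito rule,

$$\begin{aligned}
p_s(x) &= (p_s(x) D) H_{0,s} \\
&= E[c_\xi(\xi^*_{0,T}(x_0)) D_{0,T}] - \int_0^s p_r(\xi^*_{0,r}(x_0))\, f_x(r)\,dr \\
&\quad - \sum_{k=1}^n \int_0^s p_r(\xi^*_{0,r}(x_0))\, g^{(k)}_x(r)\,dw^k_r + \sum_{k=1}^n \int_0^s p_r(\xi^*_{0,r}(x_0))\, \bigl(g^{(k)}_x(r)\bigr)^2\,dr \\
&\quad + \int_0^s V_{xx}(r, \xi^*_{0,r}(x_0))\, g(r)\,dw_r + \sum_{k=1}^n \int_0^s p_r(\xi^*_{0,r}(x_0))\, g^{(k)}_x(r)\,dw^k_r \\
&\quad - \sum_{k=1}^n \int_0^s V_{xx}(r, \xi^*_{0,r}(x_0))\, g^{(k)}(r)\, g^{(k)}_x(r)\,dr - \sum_{k=1}^n \int_0^s p_r(\xi^*_{0,r}(x_0))\, \bigl(g^{(k)}_x(r)\bigr)^2\,dr \\
&= E[c_\xi(\xi^*_{0,T}(x_0)) D_{0,T}] - \int_0^s p_r(\xi^*_{0,r}(x_0))\, f_x(r)\,dr \\
&\quad + \int_0^s V_{xx}(r, \xi^*_{0,r}(x_0))\, g(r)\,dw_r - \sum_{k=1}^n \int_0^s V_{xx}(r, \xi^*_{0,r}(x_0))\, g^{(k)}(r)\, g^{(k)}_x(r)\,dr
\end{aligned}$$

so establishing the result.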

This verifies by a simple, direct method the formula of Haussmann [10].